Statistical Modelling of COVID-19 Outbreak in Italy

06 Apr 2020



Nonlinear growth models

Nonlinear growth models represent an instance of nonlinear regression models, a class of models taking the general form \[ y = \mu(x, \theta) + \epsilon, \] where \(\mu(x, \theta)\) is the mean function which depends on a possibly vector-valued parameter \(\theta\), and a possibly vector-valued predictor \(x\). The stochastic component \(\epsilon\) represents the error with mean zero and constant variance. Usually, a Gaussian distribution is also assumed for the error term.

By defining the mean function \(\mu(x, \theta)\) we may obtain several different models, all characterized by the fact that parameters \(\theta\) enter in a nonlinear way into the equation. Parameters are usually estimated by nonlinear least squares which aims at minimizing the residual sum of squares.

Exponential

\[ \mu(x) = \theta_1 \exp\{\theta_2 x\} \] where \(\theta_1\) is the value at the origin (i.e. \(\mu(x=0)\)), and \(\theta_2\) represents the (constant) relative ratio of change (i.e. \(\frac{d\mu(x)}{dx }\frac{1}{\mu(x)} = \theta_2\)). Thus, the model describes an increasing (exponential growth if \(\theta_2 > 0\)) or decreasing (exponential decay if \(\theta_2 < 0\)) trend with constant relative rate.

Logistic

\[ \mu(x) = \frac{\theta_1}{1+\exp\{(\theta_2 - x)/\theta_3\}} \] where \(\theta_1\) is the upper horizontal asymptote, \(\theta_2\) represents the x-value at the inflection point of the symmetric growth curve, and \(\theta_3\) represents a scale parameter (and \(1/\theta_3\) is the growth-rate parameter that controls how quickly the curve approaches the upper asymptote).

Gompertz

\[ \mu(x) = \theta_1 \exp\{-\theta_2 \theta_3^x\} \] where \(\theta_1\) is the horizontal asymptote, \(\theta_2\) represents the value of the function at \(x = 0\) (displacement along the x-axis), and \(\theta_3\) represents a scale parameter.

The difference between the logistic and Gompertz functions is that the latter is not symmetric around the inflection point.

Richards

\[ \mu(x) = \theta_1 (1 - \exp\{-\theta_2 x\})^{\theta_3} \] where \(\theta_1\) is the horizontal asymptote, \(\theta_2\) represents the rate of growth, and \(\theta_3\) in part determines the point of inflection on the y-axis.

Data

Dipartimento della Protezione Civile: COVID-19 Italia - Monitoraggio della situazione http://arcg.is/C1unv

Source: https://github.com/pcm-dpc/COVID-19

url = "https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-andamento-nazionale/dpc-covid19-ita-andamento-nazionale.csv"
COVID19 <- read.csv(file = url, stringsAsFactors = FALSE)
COVID19$data <- as.Date(COVID19$data)
# DT::datatable(COVID19)

Warnings

- 29/03/2020: dati Regione Emilia Romagna parziali (dato tampone non aggiornato).
- 26/03/2020: dati Regione Piemonte parziali (-50 deceduti - comunicazione tardiva)
- 18/03/2020: dati Regione Campania non pervenuti.
- 18/03/2020: dati Provincia di Parma non pervenuti.
- 17/03/2020: dati Provincia di Rimini non aggiornati
- 16/03/2020: dati P.A. Trento e Puglia non pervenuti.
- 11/03/2020: dati Regione Abruzzo non pervenuti.
- 10/03/2020: dati Regione Lombardia parziali.
- 07/03/2020: dati Brescia +300 esiti positivi


Modelling total infected

# create data for analysis
data = data.frame(date = COVID19$data,
                  y = COVID19$totale_casi)
data$x = as.numeric(data$date) - min(as.numeric(data$date)) + 1
DT::datatable(data, options = list("pageLength" = 5))

Estimation

Exponential

mod1_start = lm(log(y) ~ x, data = data)
b = unname(coef(mod1_start))
start = list(th1 = exp(b[1]), th2 = b[2])
exponential <- function(x, th1, th2) th1 * exp(th2 * x)
mod1 = nls(y ~ exponential(x, th1, th2), data = data, start = start)
summary(mod1)
## 
## Formula: y ~ exponential(x, th1, th2)
## 
## Parameters:
##        Estimate  Std. Error t value      Pr(>|t|)    
## th1 6199.742591  836.598772   7.411 0.00000000436 ***
## th2    0.074599    0.003606  20.686       < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9592 on 41 degrees of freedom
## 
## Number of iterations to convergence: 10 
## Achieved convergence tolerance: 0.000006833

Logistic

mod2 = nls(y ~ SSlogis(x, Asym, xmid, scal), data = data)
summary(mod2)
## 
## Formula: y ~ SSlogis(x, Asym, xmid, scal)
## 
## Parameters:
##         Estimate  Std. Error t value Pr(>|t|)    
## Asym 145749.4585   1656.3025   88.00   <2e-16 ***
## xmid     30.6739      0.1836  167.04   <2e-16 ***
## scal      5.9592      0.0984   60.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1324 on 40 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 0.000004902

Gompertz

mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data)
# start = list(Asym = coef(mod2)[1])
# tmp = list(y = log(log(start$Asym) - log(data$y)), x = data$x)
# b = unname(coef(lm(y ~ x, data = tmp)))
# start = c(start, c(b2 = exp(b[1]), b3 = exp(b[2])))
# mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data, start = start,
#            control = nls.control(maxiter = 1000))
summary(mod3)
## 
## Formula: y ~ SSgompertz(x, Asym, b2, b3)
## 
## Parameters:
##           Estimate    Std. Error t value Pr(>|t|)    
## Asym 198863.199939   3924.344773   50.67   <2e-16 ***
## b2        9.922099      0.275274   36.04   <2e-16 ***
## b3        0.928099      0.001465  633.55   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1013 on 40 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 0.0000007598

Richards

richards <- function(x, th1, th2, th3) th1*(1 - exp(-th2*x))^th3
Loss  <- function(th, y, x) sum((y - richards(x, th[1], th[2], th[3]))^2) 
start <- optim(par = c(coef(mod2)[1], 0.001, 1), fn = Loss, 
               y = data$y, x = data$x)$par
names(start) <- c("th1", "th2", "th3")
mod4 = nls(y ~ richards(x, th1, th2, th3), data = data, start = start,
           # trace = TRUE, algorithm = "plinear", 
           control = nls.control(maxiter = 1000, tol = 0.1))
# algorithm is not converging... 
summary(mod4)
## 
## Formula: y ~ richards(x, th1, th2, th3)
## 
## Parameters:
##          Estimate    Std. Error t value Pr(>|t|)    
## th1 218049.238346   7981.618888   27.32   <2e-16 ***
## th2      0.061956      0.002685   23.07   <2e-16 ***
## th3      6.782691      0.369165   18.37   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1309 on 40 degrees of freedom
## 
## Number of iterations to convergence: 4 
## Achieved convergence tolerance: 0.007574
# library(nlmrt)
# mod4 = nlxb(y ~ th1*(1 - exp(-th2*x))^th3, 
#             data = data, start = start, trace = TRUE)

Models comparison

models = list("Exponential model" = mod1, 
              "Logistic model" = mod2, 
              "Gompertz model" = mod3,
              "Richards model" = mod4)
tab = data.frame(loglik = sapply(models, logLik),
                 df = sapply(models, function(m) attr(logLik(m), "df")),
                 Rsquare = sapply(models, function(m) 
                                  cor(data$y, fitted(m))^2),
                 AIC = sapply(models, AIC),
                 AICc = sapply(models, AICc),
                 BIC = sapply(models, BIC))
sel <- apply(tab[,4:6], 2, which.min)
tab$"" <- sapply(tabulate(sel, nbins = length(models))+1, symnum,
                 cutpoints = 0:4, symbols = c("", "*", "**", "***"))
knitr::kable(tab)
loglik df Rsquare AIC AICc BIC
Exponential model -454.2454 3 0.9621979 914.4907 915.1061 919.7743
Logistic model -368.5541 4 0.9992898 745.1083 746.1609 752.1531
Gompertz model -357.0509 4 0.9995810 722.1018 723.1545 729.1466 ***
Richards model -368.0754 4 0.9993424 744.1508 745.2034 751.1956
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(aes(y = fitted(mod1), color = "Exponential")) +
  geom_line(aes(y = fitted(mod2), color = "Logistic")) +
  geom_line(aes(y = fitted(mod3), color = "Gompertz")) +
  geom_line(aes(y = fitted(mod4), color = "Richards")) +
  labs(x = "", y = "Infected", color = "Model") +
  scale_color_manual(values = cols) +
  scale_y_continuous(breaks = seq(0, coef(mod2)[1], by = 10000),
                     minor_breaks = seq(0, coef(mod2)[1], by = 5000)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

last_plot() +
  scale_y_continuous(trans = "log10", limits = c(100,NA)) +
  labs(y = "Infected (log10 scale)")

Predictions

Point estimates

df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1),
               fit1 = predict(mod1, newdata = df),
               fit2 = predict(mod2, newdata = df),
               fit3 = predict(mod3, newdata = df),
               fit4 = predict(mod4, newdata = df))
ylim = c(0, max(df[,c("fit2", "fit3")]))
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(data = df, aes(x = date, y = fit1, color = "Exponential")) +
  geom_line(data = df, aes(x = date, y = fit2, color = "Logistic")) +
  geom_line(data = df, aes(x = date, y = fit3, color = "Gompertz")) +
  geom_line(data = df, aes(x = date, y = fit4, color = "Richards")) +
  coord_cartesian(ylim = ylim) +
  labs(x = "", y = "Infected", color = "Model") +
  scale_y_continuous(breaks = seq(0, max(ylim), by = 10000),
                     minor_breaks = seq(0, max(ylim), by = 5000)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  scale_color_manual(values = cols) +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

Prediction intervals

# compute prediction using Moving Block Bootstrap (MBB) for nls
df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1))

pred1 = cbind(df, "fit" = predict(mod1, newdata = df))
pred1[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod1, df[df$x > max(data$x),])[,2:3]

pred2 = cbind(df, "fit" = predict(mod2, newdata = df))
pred2[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod2, df[df$x > max(data$x),])[,2:3]

pred3 = cbind(df, "fit" = predict(mod3, newdata = df))
pred3[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod3, df[df$x > max(data$x),])[,2:3]

pred4 = cbind(df, "fit" = predict(mod4, newdata = df))
pred4[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod4, df[df$x > max(data$x),])[,2:3]

# predictions for next day
pred = rbind(subset(pred1, x == max(data$x)+1, select = 2:5),
             subset(pred2, x == max(data$x)+1, select = 2:5),
             subset(pred3, x == max(data$x)+1, select = 2:5),
             subset(pred4, x == max(data$x)+1, select = 2:5))
print(pred, digits = 3)
##           date    fit    lwr    upr
## 44  2020-04-07 165155 137626 194985
## 441 2020-04-07 131678 128366 134285
## 442 2020-04-07 137064 134700 140120
## 443 2020-04-07 137747 134223 141923

ylim = c(0, max(pred2$upr, pred3$upr, na.rm=TRUE))
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(data = pred1, aes(x = date, y = fit, color = "Exponential")) +
  geom_line(data = pred2, aes(x = date, y = fit, color = "Logistic")) +
  geom_line(data = pred3, aes(x = date, y = fit, color = "Gompertz")) +
  geom_line(data = pred4, aes(x = date, y = fit, color = "Richards")) +
  geom_ribbon(data = pred1, aes(x = date, ymin = lwr, ymax = upr), 
              inherit.aes = FALSE, fill = cols[1], alpha=0.3) +
  geom_ribbon(data = pred2, aes(x = date, ymin = lwr, ymax = upr), 
              inherit.aes = FALSE, fill = cols[2], alpha=0.3) +
  geom_ribbon(data = pred3, aes(x = date, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = cols[3], alpha=0.3) +
  geom_ribbon(data = pred4, aes(x = date, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = cols[4], alpha=0.3) +
  coord_cartesian(ylim = c(0, max(ylim))) +
  labs(x = "", y = "Infected", color = "Model") +
  scale_y_continuous(minor_breaks = seq(0, max(ylim), by = 10000)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  scale_color_manual(values = cols) +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

Modelling total deceased

# create data for analysis
data = data.frame(date = COVID19$data,
                  y = COVID19$deceduti)
data$x = as.numeric(data$date) - min(as.numeric(data$date)) + 1
DT::datatable(data, options = list("pageLength" = 5))

Estimation

Exponential

mod1_start = lm(log(y) ~ x, data = data)
b = unname(coef(mod1_start))
start = list(th1 = exp(b[1]), th2 = b[2])
exponential <- function(x, th1, th2) th1 * exp(th2 * x)
mod1 = nls(y ~ exponential(x, th1, th2), data = data, start = start)
summary(mod1)
## 
## Formula: y ~ exponential(x, th1, th2)
## 
## Parameters:
##       Estimate Std. Error t value     Pr(>|t|)    
## th1 419.963446  61.435932   6.836 0.0000000281 ***
## th2   0.088578   0.003823  23.171      < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 998.2 on 41 degrees of freedom
## 
## Number of iterations to convergence: 11 
## Achieved convergence tolerance: 0.000002516

Logistic

mod2 = nls(y ~ SSlogis(x, Asym, xmid, scal), data = data)
summary(mod2)
## 
## Formula: y ~ SSlogis(x, Asym, xmid, scal)
## 
## Parameters:
##        Estimate Std. Error t value Pr(>|t|)    
## Asym 19276.6852   241.9449   79.67   <2e-16 ***
## xmid    33.5410     0.1780  188.39   <2e-16 ***
## scal     5.5528     0.0845   65.71   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 138.7 on 40 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 0.000000919

Gompertz

mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data)
# manually set starting values
# start = list(Asym = coef(mod2)[1])
# tmp = list(y = log(log(start$Asym) - log(data$y)), x = data$x)
# b = unname(coef(lm(y ~ x, data = tmp)))
# start = c(start, c(b2 = exp(b[1]), b3 = exp(b[2])))
# mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data, start = start, 
#            control = nls.control(maxiter = 10000))
summary(mod3)
## 
## Formula: y ~ SSgompertz(x, Asym, b2, b3)
## 
## Parameters:
##          Estimate   Std. Error t value Pr(>|t|)    
## Asym 29290.512243   651.530700   44.96   <2e-16 ***
## b2      13.112232     0.362879   36.13   <2e-16 ***
## b3       0.929286     0.001339  694.07   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 97.97 on 40 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 0.0000001946

Richards

richards <- function(x, th1, th2, th3) th1*(1 - exp(-th2*x))^th3
Loss  <- function(th, y, x) sum((y - richards(x, th[1], th[2], th[3]))^2) 
start <- optim(par = c(coef(mod2)[1], 0.001, 1), fn = Loss, 
               y = data$y, x = data$x)$par
names(start) <- c("th1", "th2", "th3")
mod4 = nls(y ~ richards(x, th1, th2, th3), data = data, start = start,
           # trace = TRUE, algorithm = "port", 
           control = nls.control(maxiter = 1000))
summary(mod4)
## 
## Formula: y ~ richards(x, th1, th2, th3)
## 
## Parameters:
##         Estimate   Std. Error t value Pr(>|t|)    
## th1 32295.944403  1222.808083   26.41   <2e-16 ***
## th2     0.062505     0.002289   27.31   <2e-16 ***
## th3     9.297904     0.463054   20.08   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 121.2 on 40 degrees of freedom
## 
## Number of iterations to convergence: 8 
## Achieved convergence tolerance: 0.00000966

Models comparison

models = list("Exponential model" = mod1, 
              "Logistic model" = mod2, 
              "Gompertz model" = mod3,
              "Richards model" = mod4)
tab = data.frame(loglik = sapply(models, logLik),
                 df = sapply(models, function(m) attr(logLik(m), "df")),
                 Rsquare = sapply(models, function(m) 
                                  cor(data$y, fitted(m))^2),
                 AIC = sapply(models, AIC),
                 AICc = sapply(models, AICc),
                 BIC = sapply(models, BIC))
sel <- apply(tab[,4:6], 2, which.min)
tab$"" <- sapply(tabulate(sel, nbins = length(models))+1, symnum,
                 cutpoints = 0:4, symbols = c("", "*", "**", "***"))
knitr::kable(tab)
loglik df Rsquare AIC AICc BIC
Exponential model -356.9446 3 0.9732045 719.8892 720.5046 725.1728
Logistic model -271.5478 4 0.9995193 551.0956 552.1482 558.1404
Gompertz model -256.6009 4 0.9997169 521.2018 522.2544 528.2466 ***
Richards model -265.7410 4 0.9995838 539.4820 540.5347 546.5268
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(aes(y = fitted(mod1), color = "Exponential")) +
  geom_line(aes(y = fitted(mod2), color = "Logistic")) +
  geom_line(aes(y = fitted(mod3), color = "Gompertz")) +
  geom_line(aes(y = fitted(mod4), color = "Richards")) +
  labs(x = "", y = "Deceased", color = "Model") +
  scale_color_manual(values = cols) +
  scale_y_continuous(breaks = seq(0, coef(mod2)[1], by = 1000),
                     minor_breaks = seq(0, coef(mod2)[1], by = 500)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

last_plot() +
  scale_y_continuous(trans = "log10", limits = c(10,NA)) +
  labs(y = "Deceased (log10 scale)")

Predictions

Point estimates

df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1),
               fit1 = predict(mod1, newdata = df),
               fit2 = predict(mod2, newdata = df),
               fit3 = predict(mod3, newdata = df),
               fit4 = predict(mod4, newdata = df))
ylim = c(0, max(df[,-(1:3)]))
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(data = df, aes(x = date, y = fit1, color = "Exponential")) +
  geom_line(data = df, aes(x = date, y = fit2, color = "Logistic")) +
  geom_line(data = df, aes(x = date, y = fit3, color = "Gompertz")) +
  geom_line(data = df, aes(x = date, y = fit4, color = "Richards")) +
  coord_cartesian(ylim = ylim) +
  labs(x = "", y = "Deceased", color = "Model") +
  scale_y_continuous(breaks = seq(0, max(ylim), by = 1000),
                     minor_breaks = seq(0, max(ylim), by = 1000)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  scale_color_manual(values = cols) +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

Prediction intervals

# compute prediction using Moving Block Bootstrap (MBB) for nls
df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1))

pred1 = cbind(df, "fit" = predict(mod1, newdata = df))
pred1[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod1, df[df$x > max(data$x),])[,2:3]

pred2 = cbind(df, "fit" = predict(mod2, newdata = df))
pred2[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod2, df[df$x > max(data$x),])[,2:3]

pred3 = cbind(df, "fit" = predict(mod3, newdata = df))
pred3[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod3, df[df$x > max(data$x),])[,2:3]

pred4 = cbind(df, "fit" = predict(mod4, newdata = df))
pred4[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod4, df[df$x > max(data$x),])[,2:3]

# predictions for next day
pred = rbind(subset(pred1, x == max(data$x)+1, select = 2:5),
             subset(pred2, x == max(data$x)+1, select = 2:5),
             subset(pred3, x == max(data$x)+1, select = 2:5),
             subset(pred4, x == max(data$x)+1, select = 2:5))
print(pred, digits = 3)
##           date   fit   lwr   upr
## 44  2020-04-07 20694 17845 23964
## 441 2020-04-07 16733 16317 17078
## 442 2020-04-07 17409 17174 17695
## 443 2020-04-07 17476 17171 17834

ylim = c(0, max(pred2$upr, pred3$upr, na.rm=TRUE))
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(data = pred1, aes(x = date, y = fit, color = "Exponential")) +
  geom_line(data = pred2, aes(x = date, y = fit, color = "Logistic")) +
  geom_line(data = pred3, aes(x = date, y = fit, color = "Gompertz")) +
  geom_line(data = pred4, aes(x = date, y = fit, color = "Richards")) +
  geom_ribbon(data = pred1, aes(x = date, ymin = lwr, ymax = upr), 
              inherit.aes = FALSE, fill = cols[1], alpha=0.3) +
  geom_ribbon(data = pred2, aes(x = date, ymin = lwr, ymax = upr), 
              inherit.aes = FALSE, fill = cols[2], alpha=0.3) +
  geom_ribbon(data = pred3, aes(x = date, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = cols[3], alpha=0.3) +
  geom_ribbon(data = pred4, aes(x = date, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = cols[4], alpha=0.3) +
  coord_cartesian(ylim = c(0, max(ylim))) +
  labs(x = "", y = "Deceased", color = "Model") +
  scale_y_continuous(minor_breaks = seq(0, max(ylim), by = 1000)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  scale_color_manual(values = cols) +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

Modelling recovered

# create data for analysis
data = data.frame(date = COVID19$data,
                  y = COVID19$dimessi_guariti)
data$x = as.numeric(data$date) - min(as.numeric(data$date)) + 1
DT::datatable(data, options = list("pageLength" = 5))

Estimation

Exponential

mod1_start = lm(log(y) ~ x, data = data)
b = unname(coef(mod1_start))
start = list(th1 = exp(b[1]), th2 = b[2])
exponential <- function(x, th1, th2) th1 * exp(th2 * x)
mod1 = nls(y ~ exponential(x, th1, th2), data = data, start = start)
summary(mod1)
## 
## Formula: y ~ exponential(x, th1, th2)
## 
## Parameters:
##       Estimate Std. Error t value       Pr(>|t|)    
## th1 467.190189  54.985377   8.497 0.000000000139 ***
## th2   0.092952   0.003058  30.397        < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1003 on 41 degrees of freedom
## 
## Number of iterations to convergence: 11 
## Achieved convergence tolerance: 0.000002176

Logistic

mod2 = nls(y ~ SSlogis(x, Asym, xmid, scal), data = data)
summary(mod2)
## 
## Formula: y ~ SSlogis(x, Asym, xmid, scal)
## 
## Parameters:
##        Estimate Std. Error t value Pr(>|t|)    
## Asym 30906.6083   867.3091   35.63   <2e-16 ***
## xmid    36.5392     0.4012   91.08   <2e-16 ***
## scal     6.3266     0.1437   44.02   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 264.8 on 40 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 0.00000861

Gompertz

mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data)
summary(mod3)
## 
## Formula: y ~ SSgompertz(x, Asym, b2, b3)
## 
## Parameters:
##          Estimate   Std. Error t value Pr(>|t|)    
## Asym 65035.229252  3725.492431   17.46   <2e-16 ***
## b2      10.039705     0.265020   37.88   <2e-16 ***
## b3       0.948472     0.001704  556.45   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 178.3 on 40 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 0.000002517

Richards

richards <- function(x, th1, th2, th3) th1*(1 - exp(-th2*x))^th3
Loss  <- function(th, y, x) sum((y - richards(x, th[1], th[2], th[3]))^2) 
start <- optim(par = c(coef(mod2)[1], 0.001, 1), fn = Loss, 
               y = data$y, x = data$x)$par
names(start) <- c("th1", "th2", "th3")
mod4 = nls(y ~ richards(x, th1, th2, th3), data = data, start = start,
           # trace = TRUE, # algorithm = "port", 
           control = nls.control(maxiter = 1000))
summary(mod4)
## 
## Formula: y ~ richards(x, th1, th2, th3)
## 
## Parameters:
##         Estimate   Std. Error t value           Pr(>|t|)    
## th1 98794.370720 12855.709038   7.685 0.0000000021243871 ***
## th2     0.034480     0.002978  11.579 0.0000000000000239 ***
## th3     5.617545     0.333531  16.843            < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 195.4 on 40 degrees of freedom
## 
## Number of iterations to convergence: 13 
## Achieved convergence tolerance: 0.000001075

Models comparison

models = list("Exponential model" = mod1, 
              "Logistic model" = mod2, 
              "Gompertz model" = mod3,
              "Richards model" = mod4)
tab = data.frame(loglik = sapply(models, logLik),
                 df = sapply(models, function(m) attr(logLik(m), "df")),
                 Rsquare = sapply(models, function(m) 
                                  cor(data$y, fitted(m))^2),
                 AIC = sapply(models, AIC),
                 AICc = sapply(models, AICc),
                 BIC = sapply(models, BIC))
sel <- apply(tab[,4:6], 2, which.min)
tab$"" <- sapply(tabulate(sel, nbins = length(models))+1, symnum,
                 cutpoints = 0:4, symbols = c("", "*", "**", "***"))
knitr::kable(tab)
loglik df Rsquare AIC AICc BIC
Exponential model -357.1480 3 0.9848054 720.2960 720.9114 725.5796
Logistic model -299.3563 4 0.9988625 606.7126 607.7652 613.7574
Gompertz model -282.3540 4 0.9994236 572.7081 573.7607 579.7529 ***
Richards model -286.2923 4 0.9993380 580.5847 581.6373 587.6295
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(aes(y = fitted(mod1), color = "Exponential")) +
  geom_line(aes(y = fitted(mod2), color = "Logistic")) +
  geom_line(aes(y = fitted(mod3), color = "Gompertz")) +
  geom_line(aes(y = fitted(mod4), color = "Richards")) +
  labs(x = "", y = "Recovered", color = "Model") +
  scale_color_manual(values = cols) +
  scale_y_continuous(breaks = seq(0, coef(mod2)[1], by = 1000),
                     minor_breaks = seq(0, coef(mod2)[1], by = 500)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

last_plot() +
  scale_y_continuous(trans = "log10", limits = c(10,NA)) +
  labs(y = "Recovered (log10 scale)")

Predictions

Point estimates

df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1),
               fit1 = predict(mod1, newdata = df),
               fit2 = predict(mod2, newdata = df),
               fit3 = predict(mod3, newdata = df),
               fit4 = predict(mod4, newdata = df))
ylim = c(0, max(df[,-(1:3)]))
ggplot(data, aes(x = date, y = y)) + 
  geom_point() + 
  geom_line(data = df, aes(x = date, y = fit1, color = "Exponential")) +
  geom_line(data = df, aes(x = date, y = fit2, color = "Logistic")) +
  geom_line(data = df, aes(x = date, y = fit3, color = "Gompertz")) +
  geom_line(data = df, aes(x = date, y = fit4, color = "Richards")) +
  coord_cartesian(ylim = ylim) +
  labs(x = "", y = "Recovered", color = "Model") +
  scale_y_continuous(breaks = seq(0, max(ylim), by = 1000),
                     minor_breaks = seq(0, max(ylim), by = 1000)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  scale_color_manual(values = cols) +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

Prediction intervals

# compute prediction using Moving Block Bootstrap (MBB) for nls
df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1))

pred1 = cbind(df, "fit" = predict(mod1, newdata = df))
pred1[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod1, df[df$x > max(data$x),])[,2:3]

pred2 = cbind(df, "fit" = predict(mod2, newdata = df))
pred2[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod2, df[df$x > max(data$x),])[,2:3]

pred3 = cbind(df, "fit" = predict(mod3, newdata = df))
pred3[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod3, df[df$x > max(data$x),])[,2:3]

pred4 = cbind(df, "fit" = predict(mod4, newdata = df))
pred4[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod4, df[df$x > max(data$x),])[,2:3]

# predictions for next day
pred = rbind(subset(pred1, x == max(data$x)+1, select = 2:5),
             subset(pred2, x == max(data$x)+1, select = 2:5),
             subset(pred3, x == max(data$x)+1, select = 2:5),
             subset(pred4, x == max(data$x)+1, select = 2:5))
print(pred, digits = 3)
##           date   fit   lwr   upr
## 44  2020-04-07 27907 25284 31168
## 441 2020-04-07 23638 22812 24411
## 442 2020-04-07 24432 23969 24955
## 443 2020-04-07 24583 24153 25146

ylim = c(0, max(pred2$upr, pred3$upr, na.rm=TRUE))
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(data = pred1, aes(x = date, y = fit, color = "Exponential")) +
  geom_line(data = pred2, aes(x = date, y = fit, color = "Logistic")) +
  geom_line(data = pred3, aes(x = date, y = fit, color = "Gompertz")) +
  geom_line(data = pred4, aes(x = date, y = fit, color = "Richards")) +
  geom_ribbon(data = pred1, aes(x = date, ymin = lwr, ymax = upr), 
              inherit.aes = FALSE, fill = cols[1], alpha=0.3) +
  geom_ribbon(data = pred2, aes(x = date, ymin = lwr, ymax = upr), 
              inherit.aes = FALSE, fill = cols[2], alpha=0.3) +
  geom_ribbon(data = pred3, aes(x = date, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = cols[3], alpha=0.3) +
  geom_ribbon(data = pred4, aes(x = date, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = cols[4], alpha=0.3) +
  coord_cartesian(ylim = c(0, max(ylim))) +
  labs(x = "", y = "Recovered", color = "Model") +
  scale_y_continuous(breaks = seq(0, max(ylim), by = 5000),
                     minor_breaks = seq(0, max(ylim), by = 1000)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  scale_color_manual(values = cols) +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

Description of evolution

Positive cases and administered swabs

df = data.frame(date = COVID19$data,
                positives = c(NA, diff(COVID19$totale_casi)),
                swabs = c(NA, diff(COVID19$tamponi)))
df$x = as.numeric(df$date) - min(as.numeric(df$date)) + 1
# df$y = df$positives/df$swabs
df$y = df$positives/c(NA, zoo::rollmean(df$swabs, 2))
df = subset(df, swabs > 50)
# DT::datatable(df[,-4], )
ggplot(df, aes(x = date)) + 
  geom_point(aes(y = swabs, color = "swabs"), pch = 19) +
  geom_line(aes(y = swabs, color = "swabs")) +
  geom_point(aes(y = positives, color = "positives"), pch = 0) +
  geom_line(aes(y = positives, color = "positives")) +
  labs(x = "", y = "Number of cases", color = "") +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  scale_color_manual(values = palette()[c(2,1)]) +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

ggplot(df, aes(x = date, y = y)) + 
  geom_smooth(method = "loess", se = TRUE, col = "black") +
  geom_point(col=palette()[4]) + 
  geom_line(size = 0.5, col=palette()[4]) +
  labs(x = "", y = "% positives among admnistered swabs (two-day rolling mean)") +
  scale_y_continuous(labels = scales::percent_format(),
                     breaks = seq(0, 0.5, by = 0.05)) +
  coord_cartesian(ylim = c(0,max(df$y, na.rm = TRUE))) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

Hospitalized and ICU patients

df = data.frame(date = COVID19$data,
                hospital = c(NA, diff(COVID19$totale_ospedalizzati)),
                icu = c(NA, diff(COVID19$terapia_intensiva)))
df$x = as.numeric(df$date) - min(as.numeric(df$date)) + 1
ggplot(df, aes(x = date, y = hospital)) + 
  geom_smooth(method = "loess", se = TRUE, col = "black") +
  geom_point(col = "orange") + 
  geom_line(size = 0.5, col = "orange") +
  labs(x = "", y = "Change hospitalized patients") +
  coord_cartesian(ylim = range(df$hospital, na.rm = TRUE)) +
  scale_y_continuous(minor_breaks = seq(min(df$hospital, na.rm = TRUE),
                                        max(df$hospital, na.rm = TRUE), 
                                        by = 100)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

ggplot(df, aes(x = date, y = icu)) + 
  geom_smooth(method = "loess", se = TRUE, col = "black") +
  geom_point(col = "red2") + 
  geom_line(size = 0.5, col = "red2") +
  labs(x = "", y = "Change ICU patients") +
  coord_cartesian(ylim = range(df$icu, na.rm = TRUE)) +
  scale_y_continuous(minor_breaks = seq(min(df$icu, na.rm = TRUE), 
                                        max(df$icu, na.rm = TRUE), 
                                        by = 10)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))